translation rule
Hierarchical Phrase-based Sequence-to-Sequence Learning
Wang, Bailin, Titov, Ivan, Andreas, Jacob, Kim, Yoon
We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference. Our approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one by one. We use the same seq2seq model to translate at all phrase scales, which results in two inference modes: one in which the parser is discarded and only the seq2seq component is used at the sequence level, and another in which the parser is combined with the seq2seq model. Decoding in the latter mode is done with the cube-pruned CKY algorithm, which is more involved but can make use of new translation rules during inference. We formalize our model as a source-conditioned synchronous grammar and develop an efficient variational inference algorithm for training. When applied on top of both randomly initialized and pretrained seq2seq models, both inference modes perform well compared to baselines on small-scale machine translation benchmarks.
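How a bracketing derivation turns phrase translations into a sentence can be shown with a toy example. This is a minimal sketch, not the authors' implementation: it assumes the derivation tree is already given (the paper obtains it with the discriminative parser and searches with cube-pruned CKY, neither of which is shown), and the hypothetical `lexicon` dictionary stands in for the seq2seq model that translates each aligned phrase. A monotone node concatenates its children's translations in source order; an inverted (`swap`) node reverses them.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """A source phrase that is translated as a single unit."""
    source: str


@dataclass
class Node:
    """A BTG node: 'mono' keeps child order on the target side, 'swap' inverts it."""
    order: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def realize(tree: Tree, translate: Callable[[str], str]) -> str:
    """Produce the target string for a BTG derivation, bottom-up."""
    if isinstance(tree, Leaf):
        # Leaves go to the phrase translator (a seq2seq model in the paper).
        return translate(tree.source)
    left = realize(tree.left, translate)
    right = realize(tree.right, translate)
    if tree.order == "mono":
        return f"{left} {right}"
    if tree.order == "swap":
        return f"{right} {left}"
    raise ValueError(f"unknown node order: {tree.order!r}")


# Hypothetical phrase table standing in for the neural phrase translator.
lexicon = {"ich habe": "I have", "das Buch": "the book", "gelesen": "read"}

# "ich habe [das Buch gelesen]": the verb must move in front of its object.
tree = Node("mono", Leaf("ich habe"), Node("swap", Leaf("das Buch"), Leaf("gelesen")))
print(realize(tree, lexicon.__getitem__))  # I have read the book
```

Because every leaf goes through the same `translate` function, the same component could also be applied to the whole sentence as a single leaf, which mirrors the parser-free inference mode described above.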
Dependency Graph-to-String Statistical Machine Translation
Li, Liangyou, Way, Andy, Liu, Qun
We present graph-based translation models which translate source graphs into target strings. Source graphs are constructed from dependency trees with extra links so that non-syntactic phrases are connected. Inspired by phrase-based models, we first introduce a translation model which segments a graph into a sequence of disjoint subgraphs and generates a translation by combining subgraph translations left-to-right using beam search. However, similar to phrase-based models, this model is weak at phrase reordering. Therefore, we further introduce a model based on a synchronous node replacement grammar which learns recursive translation rules. We provide two implementations of the model with different restrictions so that source graphs can be parsed efficiently. Experiments on Chinese–English and German–English show that our graph-based models are significantly better than corresponding sequence- and tree-based baselines.
LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models
Zhou, Yuan, Gram-Hansen, Bradley J., Kohn, Tobias, Rainforth, Tom, Yang, Hongseok, Wood, Frank
We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key strength of this language and its compilation scheme is their ability to automatically identify the parameters with respect to which the density function is discontinuous, while also providing runtime checks for boundary crossings. This enables the introduction of new inference engines that can exploit gradient information while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that delivers automated and efficient inference for non-differentiable models. Our system is backed by a mathematical formalism which ensures that any model expressed in this language has a density whose discontinuities have measure zero, as required for the inference engine to remain valid.
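DHMC is what makes the automatically detected discontinuities usable. As a rough illustration, the sketch below implements the coordinate-wise integrator from the published DHMC formulation (Nishimura, Dunson and Lu), assuming Laplace-distributed momentum with unit mass. Whether LF-PPL's engine uses exactly this update, and how it splits smooth and discontinuous coordinates, is an assumption here; in the DHMC formulation, coordinates on which the density is smooth are typically still moved with ordinary leapfrog steps. The toy `step_potential` is hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def coordinatewise_step(
    theta: Sequence[float],
    p: Sequence[float],
    potential: Callable[[List[float]], float],
    eps: float,
    order: Sequence[int],
) -> Tuple[List[float], List[float]]:
    """One coordinate-wise DHMC update with Laplace momentum and unit masses.

    `potential` is U(theta) = -log density and may be discontinuous. Each
    visited coordinate either moves by eps in the direction of its momentum,
    paying the change in U out of |p_i|, or, if |p_i| cannot cover the jump,
    stays put and reverses its momentum.
    """
    theta, p = list(theta), list(p)
    for i in order:
        direction = math.copysign(1.0, p[i])
        proposal = theta.copy()
        proposal[i] += eps * direction
        delta_u = potential(proposal) - potential(theta)
        if abs(p[i]) > delta_u:
            theta = proposal
            p[i] -= direction * delta_u  # |p_i| changes by exactly -delta_u
        else:
            p[i] = -p[i]  # reflect off the discontinuity
    return theta, p


def step_potential(theta: List[float]) -> float:
    """Hypothetical potential: a quadratic bowl plus a jump of height 1 at theta[0] = 0."""
    return 0.5 * sum(t * t for t in theta) + (1.0 if theta[0] > 0 else 0.0)


# Coordinate 0 lacks the momentum to climb the jump and bounces back;
# coordinate 1 moves downhill and gains momentum.
theta1, p1 = coordinatewise_step([-0.05, 0.3], [0.5, -0.2], step_potential, eps=0.1, order=[0, 1])
print(theta1, p1)
```

The update never evaluates a gradient, and each coordinate move conserves the total energy U(θ) + Σ|p_i|, which is why it remains well behaved across a jump in the density.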
A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly
Huang, Guoping (Chinese Academy of Sciences) | Zhang, Jiajun (Chinese Academy of Sciences) | Zhou, Yu (Chinese Academy of Sciences) | Zong, Chengqing (Chinese Academy of Sciences)
Computer-aided translation (CAT) systems are the most popular tools for helping human translators translate efficiently. To further improve efficiency, there is increasing interest in applying machine translation (MT) technology to upgrade CAT. Post-editing is the standard approach: human translators produce the translation by correcting MT outputs. In this paper, we propose a novel approach that deeply integrates MT into CAT systems: a well-designed input method that makes full use of the knowledge exploited by MT systems, such as translation rules, decoding hypotheses and n-best translation lists. Our approach lets human translators spend less time and focus on choosing better translation results rather than completing the translation entirely by themselves. Extensive experiments demonstrate that our method saves more than 14% of time and over 33% of keystrokes, and also improves translation quality by more than 3 absolute BLEU points compared with a strong baseline, i.e., post-editing using Google Pinyin.
Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach
Baker, Kathryn, Bloodgood, Michael, Callison-Burch, Chris, Dorr, Bonnie J., Filardo, Nathaniel W., Levin, Lori, Miller, Scott, Piatko, Christine
We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reported on the NIST 2009 Urdu-English translation task. This finding supports the hypothesis (posed by many researchers in the MT community, e.g., in DARPA GALE) that both syntactic and semantic information are critical for improving translation quality, and it further demonstrates that large gains can be achieved for low-resource languages whose word order differs from that of English.
Mind the Gap: Machine Translation by Minimizing the Semantic Gap in Embedding Space
Zhang, Jiajun (Chinese Academy of Sciences) | Liu, Shujie (Microsoft Research Asia) | Li, Mu (Microsoft Research Asia) | Zhou, Ming (Microsoft Research Asia) | Zong, Chengqing (Chinese Academy of Sciences)
Aiming at retaining the semantic meaning during the translation process, we propose a Recursive Neural Network (RNN) based translation model. Like previous SMT models, the RNN-based model induces translation rules from the bitexts. Unlike them, the RNN-based model learns how to represent each lexical translation rule with two compact semantic vectors, and learns how to perform decoding using merging-type (swap or monotone) dependent recursive neural networks that attempt to find the translation candidate with the minimal semantic gap from the source.

The conventional statistical machine translation (SMT) models, such as phrase-based models (Koehn et al. 2007), formal syntax-based models (Chiang 2007; Xiong, Liu, and Lin 2006) and linguistically syntax-based models (Liu, Liu, and Lin 2006; Huang, Knight, and Joshi 2006; Galley et al. 2006; Zhang et al. 2008), perform decoding and generate the translation result by composing a set of translation rules associated with high probabilities. The probabilities of the translation rules (e.g., the phrasal translation probabilities and the lexical weights in phrase-based and formal syntax-based models) are all computed from the co-occurrence statistics of the rules' source and target sides in the bilingual corpus.
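The merge-type-dependent composition can be pictured as follows. This is a minimal sketch under stated assumptions, not the paper's network: the vector size `DIM`, the random weights, the missing bias terms, the reordering of target children under a swap, and the use of Euclidean distance as the "semantic gap" are all hypothetical choices made for illustration. It shows only the idea the abstract describes: each rule carries a source vector and a target vector, the composition weights depend on the merge type, and the candidate whose two vectors end up closest is preferred.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Rule = Tuple[Vector, Vector]  # (source-side vector, target-side vector)

DIM = 3  # hypothetical embedding size


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def compose(left: Vector, right: Vector, w: Matrix) -> Vector:
    """Recursive composition without bias: tanh(W [left; right])."""
    x = left + right
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]


def merge(r1: Rule, r2: Rule, kind: str, params: Dict[Tuple[str, str], Matrix]) -> Rule:
    """Merge two adjacent rules with weights chosen by side and merge type."""
    (s1, t1), (s2, t2) = r1, r2
    src = compose(s1, s2, params[("src", kind)])
    # Assumption: under "swap" the target-side children enter in inverted order.
    tgt_left, tgt_right = (t1, t2) if kind == "mono" else (t2, t1)
    tgt = compose(tgt_left, tgt_right, params[("tgt", kind)])
    return src, tgt


def semantic_gap(rule: Rule) -> float:
    """Euclidean distance between a rule's source and target vectors."""
    src, tgt = rule
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(src, tgt)))


rng = random.Random(0)
params = {(side, kind): random_matrix(DIM, 2 * DIM, rng)
          for side in ("src", "tgt") for kind in ("mono", "swap")}
r1 = ([0.1, 0.2, 0.3], [0.1, 0.25, 0.3])
r2 = ([0.5, -0.1, 0.0], [0.4, -0.1, 0.1])
candidates = {kind: merge(r1, r2, kind, params) for kind in ("mono", "swap")}
best = min(candidates, key=lambda k: semantic_gap(candidates[k]))
print(best, {k: round(semantic_gap(v), 4) for k, v in candidates.items()})
```

With trained rather than random weights, the preferred merge type would reflect learned reordering behavior; here the choice is arbitrary and only demonstrates the mechanics.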
Topic-Based Dissimilarity and Sensitivity Models for Translation Rule Selection
Zhang, M., Xiao, X., Xiong, D., Liu, Q.
Translation rule selection is the task of selecting appropriate translation rules for an ambiguous source-language segment. As translation ambiguities are pervasive in statistical machine translation, we introduce two topic-based models for translation rule selection that incorporate global topic information into translation disambiguation. We associate each synchronous translation rule with source- and target-side topic distributions. With these topic distributions, we propose a topic dissimilarity model that selects desirable (less dissimilar) rules by penalizing rules whose topic distributions are highly dissimilar to those of the given documents. To encourage the use of non-topic-specific translation rules, we also present a topic sensitivity model that balances translation rule selection between generic rules and topic-specific rules. Furthermore, we project target-side topic distributions onto the source-side topic model space so that we can benefit from the topic information of both the source and target languages. We integrate the proposed topic dissimilarity and sensitivity models into hierarchical phrase-based machine translation for synchronous translation rule selection. Experiments show that our topic-based translation rule selection models can substantially improve translation quality.
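The abstract does not say how dissimilarity or sensitivity is measured. The sketch below assumes two common choices: the Hellinger distance between a rule's topic distribution and the document's as the dissimilarity penalty, and the entropy of the rule's topic distribution as its sensitivity score (a generic rule spreads its mass over many topics and so has high entropy). How these features are weighted inside the hierarchical phrase-based decoder is not shown, and all distributions below are made up.

```python
import math
from typing import Sequence, Tuple


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats: high for generic rules, low for topic-specific ones."""
    return -sum(a * math.log(a) for a in p if a > 0)


def rule_features(rule_topics: Sequence[float], doc_topics: Sequence[float]) -> Tuple[float, float]:
    """(dissimilarity penalty, sensitivity) features for one rule in one document."""
    return hellinger(rule_topics, doc_topics), entropy(rule_topics)


doc = [0.7, 0.2, 0.1]               # hypothetical document topic distribution
specific_match = [0.8, 0.1, 0.1]    # topic-specific rule, same dominant topic
specific_clash = [0.05, 0.05, 0.9]  # topic-specific rule, different topic
generic = [1 / 3, 1 / 3, 1 / 3]     # rule used evenly across topics
for name, topics in [("match", specific_match), ("clash", specific_clash), ("generic", generic)]:
    print(name, rule_features(topics, doc))
```

A decoder using both features could then penalize the clashing rule heavily while still letting the generic rule compete, which is the balance the sensitivity model is meant to provide.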
Fuzzy sets as a basis for a theory of possibility
The theory of possibility described in this paper is related to the theory of fuzzy sets by defining the concept of a possibility distribution as a fuzzy restriction which acts as an elastic constraint on the values that may be assigned to a variable. More specifically, if F is a fuzzy subset of a universe of discourse U={u} which is characterized by its membership function μ_F, then a proposition of the form “X is F,” where X is a variable taking values in U, induces a possibility distribution Π_X which equates the possibility of X taking the value u to μ_F(u), the compatibility of u with F. In this way, X becomes a fuzzy variable which is associated with the possibility distribution Π_X in much the same way as a random variable is associated with a probability distribution. In general, a variable may be associated both with a possibility distribution and a probability distribution, with the weak connection between the two expressed as the possibility/probability consistency principle. A thesis advanced in this paper is that the imprecision that is intrinsic in natural languages is, in the main, possibilistic rather than probabilistic in nature. Thus, by employing the concept of a possibility distribution, a proposition, p, in a natural language may be translated into a procedure which computes the probability distribution of a set of attributes which are implied by p. Several types of conditional translation rules are discussed and, in particular, a translation rule for propositions of the form “X is F is α-possible,” where α is a number in the interval [0, 1], is formulated and illustrated by examples.
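A small discrete example makes the correspondence between the possibility distribution of X and the membership function of F concrete. The sketch assumes a finite universe of integers, a hypothetical membership function `mu_small` for "small", and crisp (non-fuzzy) query sets. It computes the possibility that X falls in a set as the supremum of the possibility distribution over that set, and the degree of possibility/probability consistency as the sum over u of possibility times probability. The conditional translation rules, including the one for "X is F is α-possible", are not implemented.

```python
from typing import Callable, Dict, Iterable


def possibility_distribution(universe: Iterable[int], mu: Callable[[int], float]) -> Dict[int, float]:
    """'X is F' induces Pi_X(u) = mu_F(u) for every u in the universe U."""
    return {u: mu(u) for u in universe}


def possibility(pi: Dict[int, float], subset: Iterable[int]) -> float:
    """Poss{X in A} = sup of Pi_X over a crisp set A (0 for an empty A)."""
    return max((pi[u] for u in subset), default=0.0)


def consistency(pi: Dict[int, float], prob: Dict[int, float]) -> float:
    """Degree of possibility/probability consistency: sum of Pi_X(u) * P(u)."""
    return sum(pi[u] * prob.get(u, 0.0) for u in pi)


def mu_small(u: int) -> float:
    """Hypothetical membership function for the fuzzy set 'small' on 0..10."""
    return 1.0 if u <= 2 else max(0.0, 1.0 - 0.25 * (u - 2))


pi_x = possibility_distribution(range(11), mu_small)  # "X is small"
print(possibility(pi_x, {4, 5, 9}))                   # 0.5
print(consistency(pi_x, {1: 0.5, 4: 0.5}))            # 0.75
```

The second call illustrates the weak connection between the two notions: a probability distribution that puts half its mass on a value that is only 0.5-possible is still fairly, but not fully, consistent with "X is small".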